National Repository of Grey Literature
Identifying Entity Types Based on Information Extraction from Wikipedia
Rusiňák, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This paper presents a system for identifying the entity types of Wikipedia articles (e.g. people or sports events); it can be used to identify any arbitrary entity type. The input to the system is a list of several pages that belong to the entity type and a list of several pages that do not. These lists are used to generate features, which in turn are used to generate the list of all pages belonging to the entity type. The features can be based both on structured information on Wikipedia, such as templates and categories, and on unstructured information obtained by analyzing the natural-language text of the article's first sentence, where a defining noun representing what the article is about is found. The system supports pages written in Czech and English and can be extended to support other languages.
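The idea of deriving features from a positive and a negative page list can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the `min_ratio` threshold, the `template:`/`noun:` feature labels and the sample pages are all hypothetical, and the selection rule (keep features common among positive pages and absent from negative ones) is only one simple way such lists could be used.

```python
from collections import Counter

def select_features(positive, negative, min_ratio=0.8):
    """Keep features found in at least min_ratio of the positive pages
    and in none of the negative pages (hypothetical selection rule)."""
    pos_counts = Counter(f for feats in positive.values() for f in set(feats))
    neg_feats = {f for feats in negative.values() for f in feats}
    threshold = min_ratio * len(positive)
    return {f for f, c in pos_counts.items()
            if c >= threshold and f not in neg_feats}

def classify(page_features, selected):
    """A page is assigned to the entity type if it has any selected feature."""
    return bool(selected & set(page_features))

# Hypothetical pages with already-extracted features
# (templates, plus the defining noun of the first sentence).
positive = {
    "Albert Einstein": {"template:Infobox person", "noun:physicist"},
    "Jane Austen": {"template:Infobox person", "noun:novelist"},
}
negative = {
    "Lake Geneva": {"template:Infobox lake", "noun:lake"},
}

selected = select_features(positive, negative)
```

With these sample lists, only the shared template survives the selection, so a previously unseen page carrying that template would be assigned to the entity type.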
Information Extraction from Wikipedia
Valušek, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the automatic extraction of the types of English Wikipedia articles and of their attributes. Several approaches based on machine learning are presented. In addition, important attributes are extracted, such as the date of birth in articles about people or the area in articles about lakes, among many others. Using the system presented in this thesis, one can generate a well-structured knowledge base from a file of Wikipedia articles (a so-called dump file) and a small training set containing a few correctly classified articles. Such a knowledge base can then be used for the semantic enrichment of text. During this process, a file of so-called definition words is generated. Definition words are features extracted by natural-text analysis, and they could also be used in ways other than those described in this thesis. There is also a component that can determine which articles were added, deleted or modified between the creation of two different knowledge bases.
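The comparison of two knowledge bases mentioned at the end of the abstract can be sketched as follows. This is only an illustration under stated assumptions: each knowledge base is modelled as a Python dict mapping an article title to its extracted record, and the function name and sample records are hypothetical rather than taken from the thesis.

```python
def diff_knowledge_bases(old, new):
    """Return (added, deleted, modified) article titles between two
    knowledge bases, each modelled as a dict of title -> record."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(t for t in old.keys() & new.keys() if old[t] != new[t])
    return added, deleted, modified

# Hypothetical records extracted from two different dump files.
kb_old = {
    "Lake Geneva": {"type": "lake", "area_km2": 580},
    "Jane Austen": {"type": "person", "born": "1775-12-16"},
}
kb_new = {
    "Lake Geneva": {"type": "lake", "area_km2": 580.03},
    "Albert Einstein": {"type": "person", "born": "1879-03-14"},
}

added, deleted, modified = diff_knowledge_bases(kb_old, kb_new)
```

Set operations on the dict key views keep the comparison linear in the number of articles; comparing whole records with `!=` treats any attribute change as a modification.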
